Simulation and Optimization of HPC Job Allocation for Jointly Reducing Communication and Cooling Costs
Abstract
Performance and energy are critical aspects of high-performance computing (HPC) data centers. Highly parallel HPC applications that require multiple nodes usually run for long durations, ranging from minutes to hours or days. Because the threads of parallel applications communicate with each other intensively, the communication cost of these applications has a significant impact on data center performance. Energy consumption has also become a first-order constraint for HPC data centers. Nearly half of the energy in today's computing clusters is consumed by the cooling infrastructure. Existing job allocation policies target either improving system performance or reducing the cooling energy cost of the server nodes. How to optimize system performance while minimizing cooling energy consumption is still an open question. This paper proposes a job allocation methodology aimed at jointly reducing the communication cost and the cooling energy of HPC data centers. To evaluate and validate our optimization algorithm, we implement our joint job allocation methodology in the Structural Simulation Toolkit (SST), a simulation framework for large-scale data centers. We evaluate our joint optimization algorithm using traces extracted from real-world workloads. Experimental results show that, in comparison to performance-aware job allocation algorithms, our joint optimization algorithm achieves comparable running times and reduces the cooling power by up to 42.21% across all the jobs.
Similar resources
Communication and cooling aware job allocation in data centers for communication-intensive workloads
Energy consumption is an increasingly important concern in data centers. Today, nearly half of the energy in data centers is consumed by the cooling infrastructure. Existing policies on thermally-aware workload allocation do not consider applications that include many tasks (or threads) running on a large set of nodes with significant communication among the tasks. Such jobs, however, constitut...
Adaptive Cost Optimization and Fair Resource Allocation in Computational Grid Systems
Grid computing systems offer large scale computing resources and can help carry out computation intensive jobs with improved efficiency and reduced business costs. Due to the heterogeneity of the computation and communication resources in these grid systems, efficient allocation of user jobs to resources is essential for reducing the execution time and costs. In this paper, we study an adaptive...
Joint Allocation of Computational and Communication Resources to Improve Energy Efficiency in Cellular Networks
Mobile cloud computing (MCC) is a new technology developed to overcome the restrictions of smart mobile devices (e.g. battery, processing power, storage capacity, etc.) by sending a part of the program (with complex computing) to the cloud server (CS). In this paper, we study a multi-cell multiple-input multiple-output (MIMO) system in which the cell-interior users request service...
Jointly power and bandwidth allocation for a heterogeneous satellite network
Due to the lack of resources such as transmission power and bandwidth in satellite systems, resource allocation is a very important challenge. Nowadays, new heterogeneous networks include one or more satellites alongside terrestrial infrastructure, and each satellite is assumed to have multiple beams to increase capacity. This type of structure is suitable for a new generation of commu...
On Joint Sub-channel Allocation, Duplexing Mode Selection, and Power Control in Full-Duplex Co-Channel Femtocell Networks
As one of the promising approaches to increasing network capacity, full-duplex (FD) communications have recently gained remarkable attention. FD communication enables wireless nodes to simultaneously send and receive data through the same frequency band. Thanks to the recent achievements in self-interference (SI) cancellation, this type of communication is expected to be potentially uti...